AN INFERENTIAL APPROACH TO THE BIOASSAY DESIGN PROBLEM (U)
WISCONSIN UNIV-MADISON MATHEMATICS RESEARCH CENTER
T. LEONARD, SEP 82, MRC-TSR-2416, DAAG29-80-C-0041
F/G 6/1
UNCLASSIFIED


MRC  Technical  Summary  Report  #2416 


AN  INFERENTIAL  APPROACH  TO  THE 
BIOASSAY  DESIGN  PROBLEM 


Tom  Leonard 


Mathematics  Research  Center 
University  of  Wisconsin-Madison 
610  Walnut  Street 
Madison,  Wisconsin  53706 

September  1982 

(Received  June  15,  1982) 




Approved  for  public  release 
Distribution  unlimited 


Sponsored  by 

U.  S.  Army  Research  Office 
P.  O.  Box  12211 
Research  Triangle  Park 
North  Carolina  27709 




UNIVERSITY  OF  WISCONSIN-MADISON 
MATHEMATICS  RESEARCH  CENTER 

AN  INFERENTIAL  APPROACH  TO  THE  BIOASSAY  DESIGN  PROBLEM 

Tom  Leonard 

Technical  Summary  Report  #2416 
September  1982 

ABSTRACT 

The bioassay design problem may usefully be considered within an
inferential framework, rather than by reference to a formal decision theoretic
procedure based upon a number of special assumptions. Three graphical
techniques are described to assist the user's selection of new design
points. Firstly, a plot, against dose level, of the predictive probability of
the death of the next rat will help the user to choose design points relating
to particular regions of LD values; comparison with the maximum likelihood
estimate of the response curve leads to informal stopping rules. Secondly,
new approximations to the posterior density of the effective dose are
proposed for each LD value. These are related to the marginal likelihood
ideas of Sprott and Kalbfleisch. Thirdly, mixtures of these densities lead
to design measures for the distribution of future dose levels. These seem to
make criteria like D-optimality rather tangential to the real design issue.

The  ideas  are  illustrated  graphically  by  reference  to  a  fertility  example  due 
to  Bliss. 


AMS  (MOS)  Subject  Classification:  62P10,  62F15 

Key Words: Bioassay, Design, Predictive distribution, Response curve,
Posterior distribution of effective dose, Design measure,
D-optimality.

Work  Unit  Number  4  (Statistics  and  Probability) 


Sponsored  by  the  United  States  Army  under  Contract  No.  DAAG29-80-C-0041 


SIGNIFICANCE  AND  EXPLANATION 


The bioassay design problem relates to a variety of practical situations
where there are zero-one responses which are regressed upon an explanatory
variable. The problem addressed is "how do we choose the next few dose
levels, given a few preliminary experiments?" Three graphical procedures are
proposed, including (i) a plot of the predictive response curve, (ii) the
posterior densities of effective doses for given design measures, and (iii) a
sensible choice of design measure which averages the posterior densities of
the effective doses over LD values 1, ..., 99. A rat fertility experiment is
analyzed to illustrate these procedures.




The responsibility for the wording and views expressed in this descriptive
summary lies with MRC, and not with the author of this report.


AN  INFERENTIAL  APPROACH  TO  THE  BIOASSAY  DESIGN  PROBLEM 


Tom Leonard


1.  INTRODUCTION 

Consider indicator variables $y_1, \ldots, y_n$ and explanatory variables
$x_1, \ldots, x_n$, where $y_i$ is one or zero according to the death or survival of
the $i$th rat, and $x_i$ denotes the dose level for the $i$th rat. Having observed
experiments on $n$ rats, suppose that we consider the design problem of how to
choose the dose level $x_{n+1}$ for the next rat.

Our practical procedures are based upon the following philosophy:

(a) The choice of $x_{n+1}$ should be regarded as an inferential rather
than a decision theoretic problem, i.e. it would be useful to provide
diagnostic devices giving guidelines on how $x_{n+1}$ should be chosen, and
permitting input from the user in relation to his experience and intuition.

In particular, a graphical method summarizing the useful information in the
data, rather than a formal decision based upon special and possibly
constrictive assumptions, will provide the user with a better understanding of
what the data are trying to say.

(b) The more obvious features of the information in the data concerning
the next design point $x_{n+1}$ may be extracted by considering the probability
of what is going to happen next, conditional upon what has already happened,
and also conditional upon the various feasible choices of the design point.
This may be represented by a graphical plot of the predictive probability

$$\phi^*(x_{n+1}) = p(y_{n+1} = 1 \mid y_1, \ldots, y_n; x_1, \ldots, x_n, x_{n+1}) \qquad (1.1)$$




for the next indicator variable, conditional upon $(y_1, \ldots, y_n; x_1, \ldots, x_n)$ and
the next design point $x_{n+1}$. Reference should be made here to Geisser (1971),
who views the predictive distribution as summarizing the important information
in the data concerning the next observation.

This graphical plot will be superior, for design purposes, to the maximum
likelihood estimate of the response curve, since it takes account of the
variability of the parameter estimates. It may be used to clarify which dose
levels will be useful design points for particular LD points of interest.

(c) The next level of complexity is to consider the posterior
distributions of the effective doses for the LD points of interest, since these
indicate whether more experimentation is needed in order to be adequately
precise about these regions of the response curve.

(d)  Mixtures  of  the  posterior  distributions  of  the  effective  dose 
provide  useful  design  measures  which  may  be  used  to  ascertain  the  scatter  of 
the  next  few  design  points. 

The technicalities of our approach will be Bayesian in spirit, and the
very simple methodology will provide a possible alternative to the full-dress
sequential Bayesian decision theoretic ideas of Freeman (1970). Existing
non-Bayesian methods, well catalogued by Wetherill (1966), include the famous
"up-and-down" method and the Robbins-Monro process.

The philosophy outlined in this paper is also relevant to design problems
for the linear statistical model. The latter requires rather different
technicalities; it will be treated in detail elsewhere.


2.  THE  PREDICTIVE  DESIGN  METHODOLOGY 


Consider the standard linear logistic model where $y_1, y_2, \ldots, y_n$ are
independent, given corresponding probabilities $\theta_1, \ldots, \theta_n$, where $\theta_i$ denotes
the probability that $y_i = 1$, and the logit parameters
$\alpha_i = \log\theta_i - \log(1 - \theta_i)$ satisfy

$$\alpha_i = \beta_0 + \beta_1 x_i \qquad (i = 1, \ldots, n). \qquad (2.1)$$

The predictive probability in (1.1) could be calculated by choosing a
joint prior density for $\beta_0$ and $\beta_1$ and computing the expectation of

$$p(y_{n+1} = 1 \mid \beta_0, \beta_1) = \frac{\exp(\beta_0 + \beta_1 x_{n+1})}{1 + \exp(\beta_0 + \beta_1 x_{n+1})}$$

with respect to the corresponding joint posterior density of $\beta_0$ and $\beta_1$. This method
involves two-dimensional numerical integrations, together with the
specification of a prior. However, good approximations, when there is little
prior information about $\beta_0$ and $\beta_1$, may be obtained by taking the posterior
distribution of $\boldsymbol\beta = (\beta_0, \beta_1)^T$, after $n$ observations, to be bivariate normal
with mean vector equal to the maximum likelihood vector $\hat{\boldsymbol\beta} = (\hat\beta_0, \hat\beta_1)^T$ and
covariance matrix $C_n$ equal to the inverse $I_n^{-1}$ of the likelihood
information matrix

$$I_n = \sum_{i=1}^{n} \hat\theta_i (1 - \hat\theta_i) \begin{pmatrix} 1 & x_i \\ x_i & x_i^2 \end{pmatrix}, \qquad (2.2)$$

where $\hat\theta_i = \exp(\hat\beta_0 + \hat\beta_1 x_i)/\{1 + \exp(\hat\beta_0 + \hat\beta_1 x_i)\}$ is the fitted maximum
likelihood probability for the $i$th rat.


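Hence, for any fixed $x_{n+1}$, the posterior distribution of
$\alpha_{n+1} = \beta_0 + \beta_1 x_{n+1}$, given the first $n$ observations, is approximately
normal with mean $\hat\alpha_{n+1} = \hat\beta_0 + \hat\beta_1 x_{n+1}$ and variance

$$v_{n+1} = \frac{\eta_{22} - 2x_{n+1}\eta_{12} + x_{n+1}^2 \eta_{11}}{\eta_{22}\eta_{11} - \eta_{12}^2}, \qquad (2.3)$$

where $\eta_{jk}$ is the $(j,k)$th element of the matrix in (2.2).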
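As an illustration of (2.2) and (2.3), the following Python sketch fits the linear logistic model (2.1) by Newton-Raphson and then evaluates $v_{n+1}$. It uses the grouped data of Table 1 (Section 6) purely as example input; the function names are hypothetical, and the step-halving safeguard is an implementation choice of this sketch rather than part of the report.

```python
import math

# Table 1 (Bliss, 1952): (dose level, number of rats, number of fertile rats).
BLISS = [(0.15, 5, 0), (0.20, 10, 2), (0.25, 10, 4),
         (0.30, 10, 8), (0.40, 11, 11), (0.60, 11, 11)]


def expit(a):
    """Logistic function e^a / (1 + e^a), evaluated without overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def log_lik(b0, b1, data):
    """Binomial log-likelihood of the linear logistic model (2.1), grouped data."""
    total = 0.0
    for x, n, r in data:
        a = b0 + b1 * x
        total += r * a - n * (max(a, 0.0) + math.log1p(math.exp(-abs(a))))
    return total


def score(b0, b1, data):
    """Gradient of log_lik with respect to (beta0, beta1)."""
    g0 = sum(r - n * expit(b0 + b1 * x) for x, n, r in data)
    g1 = sum(x * (r - n * expit(b0 + b1 * x)) for x, n, r in data)
    return g0, g1


def information(b0, b1, data):
    """Elements (eta11, eta12, eta22) of the information matrix (2.2)."""
    e11 = e12 = e22 = 0.0
    for x, n, _ in data:
        p = expit(b0 + b1 * x)
        w = n * p * (1.0 - p)
        e11, e12, e22 = e11 + w, e12 + w * x, e22 + w * x * x
    return e11, e12, e22


def fit_mle(data, tol=1e-12, max_iter=200):
    """(beta0_hat, beta1_hat) by Newton-Raphson, halving steps that lower log_lik."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1 = score(b0, b1, data)
        e11, e12, e22 = information(b0, b1, data)
        det = e11 * e22 - e12 * e12
        d0 = (e22 * g0 - e12 * g1) / det   # Newton step = inverse information x score
        d1 = (e11 * g1 - e12 * g0) / det
        base, step = log_lik(b0, b1, data), 1.0
        while log_lik(b0 + step * d0, b1 + step * d1, data) < base - 1e-12 and step > 1e-10:
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1


def v_next(x_new, eta):
    """Variance (2.3) of alpha_{n+1} = beta0 + beta1 * x_new."""
    e11, e12, e22 = eta
    return (e22 - 2.0 * x_new * e12 + x_new ** 2 * e11) / (e22 * e11 - e12 ** 2)


b_hat = fit_mle(BLISS)
eta = information(*b_hat, BLISS)
print("beta_hat:", b_hat, " v_{n+1} at dose 0.5:", v_next(0.5, eta))
```

Writing (2.3) in terms of the $\eta_{jk}$ avoids forming $C_n$ explicitly; it is the same quantity as $c_{11} + 2x_{n+1}c_{12} + x_{n+1}^2 c_{22}$ with $C_n = I_n^{-1}$.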


The predictive probability in (1.1) may therefore be approximated by

$$\phi^*(x_{n+1}) = \int_{-\infty}^{\infty} \frac{e^{\alpha}}{1 + e^{\alpha}}\, N(\alpha \mid \hat\alpha_{n+1}, v_{n+1})\, d\alpha, \qquad (2.4)$$

where the second factor in the integrand is a normal density with mean $\hat\alpha_{n+1}$
and variance $v_{n+1}$. Having obtained $\hat{\boldsymbol\beta}$ and $I_n$ via a standard numerical
optimization, $\phi^*(x_{n+1})$ may be calculated, for each fixed $x_{n+1}$, using a
one-dimensional numerical integration. This result is exact, given the adequacy
of the approximate posterior normality of $\beta_0$ and $\beta_1$, and will therefore be
more accurate than standard asymptotic results for the expectations of
nonlinear functions.
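A minimal sketch of the one-dimensional integration behind (2.4), assuming $\hat\alpha_{n+1}$ and $v_{n+1}$ have already been computed (for instance from (2.3)). Composite Simpson's rule over $\hat\alpha_{n+1} \pm 10\sqrt{v_{n+1}}$ is an arbitrary choice made for this illustration; any accurate one-dimensional rule would serve, and the function names are hypothetical.

```python
import math


def expit(a):
    """Logistic function e^a / (1 + e^a), evaluated without overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def normal_pdf(a, mean, var):
    """Normal density N(a | mean, var)."""
    return math.exp(-0.5 * (a - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive_prob(alpha_hat, v, m=400, half_width=10.0):
    """phi*(x_{n+1}) from (2.4) by composite Simpson's rule.

    alpha_hat and v are the approximate posterior mean and variance of
    alpha_{n+1}; the infinite range is truncated to alpha_hat +/- half_width
    standard deviations (m must be even).
    """
    sd = math.sqrt(v)
    lo = alpha_hat - half_width * sd
    h = 2.0 * half_width * sd / m
    total = 0.0
    for k in range(m + 1):
        a = lo + k * h
        weight = 1.0 if k in (0, m) else (4.0 if k % 2 else 2.0)
        total += weight * expit(a) * normal_pdf(a, alpha_hat, v)
    return total * h / 3.0


# Illustrative (hypothetical) inputs: the result lies between 0.5 and expit(2.0).
print(predictive_prob(2.0, 4.0))
```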


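Note that the predictive probability $\phi^*(x_{n+1})$, approximated by (2.4), is
also the posterior mean, after $n$ observations, of
$\theta_{n+1} = \exp(\beta_0 + \beta_1 x_{n+1})/\{1 + \exp(\beta_0 + \beta_1 x_{n+1})\}$, and $\phi^*$ is therefore the
Bayes estimate of the response curve under squared error loss. The posterior
variance of $\theta_{n+1}$ may also be approximated, by

$$v(x_{n+1}) = \int_{-\infty}^{\infty} \left\{\frac{e^{\alpha}}{1 + e^{\alpha}} - \phi^*(x_{n+1})\right\}^2 N(\alpha \mid \hat\alpha_{n+1}, v_{n+1})\, d\alpha, \qquad (2.5)$$

and it is useful to plot the standard deviation $\sqrt{v(x_{n+1})}$ as a function of $x_{n+1}$,
together with $\phi^*(x_{n+1})$.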

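It is also useful to compare plots of $\phi^*$ and $\hat\phi$, where

$$\hat\phi(x_{n+1}) = \frac{\exp(\hat\beta_0 + \hat\beta_1 x_{n+1})}{1 + \exp(\hat\beta_0 + \hat\beta_1 x_{n+1})} \qquad (-\infty < x_{n+1} < \infty). \qquad (2.6)$$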


The $\hat\phi$ function gives the limiting form as $n \to \infty$ of $\phi^*$, i.e.
$\hat\phi$ and $\phi^*$ would be identical if $\beta_0$ and $\beta_1$ were known and equal to
$\hat\beta_0$ and $\hat\beta_1$. For finite sample sizes $\phi^*$ allows for uncertainty about
$\beta_0$ and $\beta_1$. Therefore, when plotted against $x_{n+1}$, the differences between
the two curves $\phi^*$ and $\hat\phi$ express lack of information in the data about the
actual response curve. Moreover, the curves may be close for some values of
$x_{n+1}$ and very different for other values of $x_{n+1}$. This permits inferences
about which LD values can be well investigated from the $\phi^*$ curve, given the
previous design points. Note that the $\hat\phi$ curve is always steeper than the
$\phi^*$ curve, since averaging the logistic function over the normal distribution
of $\alpha_{n+1}$ pulls $\phi^*(x_{n+1})$ towards 1/2, so that it always lies between 1/2 and
$\hat\phi(x_{n+1})$. The relative steepness provides a measure of the information in
the data; for example, if $\phi^*$ is very flat then this suggests that there is
little information relating to any LD point.
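The three curves compared above can be tabulated along a grid of doses as sketched below. The values of $\hat{\boldsymbol\beta}$ and $C_n$ are hypothetical, chosen only to make the example run, and the quadrature is the same Simpson's rule approach one might use for (2.4). At every dose the computed $\phi^*$ should lie between 1/2 and $\hat\phi$, which is the sense in which $\hat\phi$ is steeper.

```python
import math

# Hypothetical posterior summaries for illustration (not values from the report).
B_HAT = (-6.0, 22.6)                 # (beta0_hat, beta1_hat)
COV = ((2.5, -9.0), (-9.0, 34.0))    # C_n, the inverse information matrix


def expit(a):
    """Logistic function e^a / (1 + e^a), evaluated without overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def phi_star_and_sd(x, m=400, half_width=10.0):
    """phi*(x) from (2.4) and sqrt(v(x)) from (2.5), by composite Simpson's rule."""
    mean = B_HAT[0] + B_HAT[1] * x                                # alpha_hat
    var = COV[0][0] + 2.0 * x * COV[0][1] + x * x * COV[1][1]     # v_{n+1}
    sd = math.sqrt(var)
    h = 2.0 * half_width * sd / m
    nodes, weights = [], []
    for k in range(m + 1):
        a = mean - half_width * sd + k * h
        simpson = 1.0 if k in (0, m) else (4.0 if k % 2 else 2.0)
        dens = math.exp(-0.5 * (a - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        nodes.append(a)
        weights.append(simpson * h / 3.0 * dens)
    phi = sum(w * expit(a) for w, a in zip(weights, nodes))
    var_theta = sum(w * (expit(a) - phi) ** 2 for w, a in zip(weights, nodes))
    return phi, math.sqrt(var_theta)


def phi_hat(x):
    """Plug-in maximum likelihood response curve (2.6)."""
    return expit(B_HAT[0] + B_HAT[1] * x)


for x in (0.15, 0.25, 0.35, 0.45, 0.60):
    phi, sd = phi_star_and_sd(x)
    print(f"dose {x:.2f}: phi* = {phi:.3f}, phi_hat = {phi_hat(x):.3f}, sd = {sd:.3f}")
```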

Suppose, for example, that the user is particularly interested in
obtaining a good estimate of the response curve for LD values between LD90 and
LD99. Then the next dose level, or levels, could be selected to lie between
the effective doses $\gamma_{90}$ and $\gamma_{99}$, where in general $\phi^*(\gamma_p) = p/100$. He
might continue to experiment, repeatedly using this selection scheme, until he
is confident (e.g. according to the criteria discussed in the next paragraph)
that his estimate of the response curve is reasonably accurate in the region
of interest. This procedure should be regarded as inferential, since the user
may modify it as necessary to take account of his intuition.

The graphical procedure outlined above permits a number of informal
stopping rules. Suitable measures of the differences between the $\phi^*$ and $\hat\phi$
curves include (i) the area between the curves, (ii) the area between the
curves between the $\gamma_{90}$ and $\gamma_{99}$ dose levels, and (iii) the differences between
$\gamma_{90}$ and $\gamma_{99}$ and the corresponding values $\hat\gamma_{90}$ and $\hat\gamma_{99}$ calculated from
the $\hat\phi$ curve; i.e. it is reasonable to stop when further observations would be
unlikely to affect those estimated LD points which are of interest to the
user.
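The selection rule and the informal stopping measures above can be sketched as follows, again with hypothetical values of $\hat{\boldsymbol\beta}$ and $C_n$. The effective doses $\gamma_p$ solve $\phi^*(\gamma_p) = p/100$ by bisection, while $\hat\gamma_p = (\lambda - \hat\beta_0)/\hat\beta_1$, with $\lambda = \log p - \log(100 - p)$, is read from $\hat\phi$. The dose range [0, 1] used for measure (i) is an assumption, since the report does not fix one.

```python
import math

# Hypothetical posterior summaries for illustration (not values from the report).
B_HAT = (-6.0, 22.6)
COV = ((2.5, -9.0), (-9.0, 34.0))


def expit(a):
    """Logistic function e^a / (1 + e^a), evaluated without overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def phi_hat(x):
    """Plug-in curve (2.6)."""
    return expit(B_HAT[0] + B_HAT[1] * x)


def phi_star(x, m=200, half_width=10.0):
    """Predictive curve (2.4) by composite Simpson's rule over alpha."""
    mean = B_HAT[0] + B_HAT[1] * x
    sd = math.sqrt(COV[0][0] + 2.0 * x * COV[0][1] + x * x * COV[1][1])
    h = 2.0 * half_width * sd / m
    total = 0.0
    for k in range(m + 1):
        a = mean - half_width * sd + k * h
        simpson = 1.0 if k in (0, m) else (4.0 if k % 2 else 2.0)
        total += simpson * expit(a) * math.exp(-0.5 * ((a - mean) / sd) ** 2)
    return total * h / 3.0 / (sd * math.sqrt(2.0 * math.pi))


def gamma_hat(p):
    """Effective dose read from phi_hat: (lambda - beta0_hat) / beta1_hat."""
    lam = math.log(p) - math.log(100.0 - p)
    return (lam - B_HAT[0]) / B_HAT[1]


def gamma_star(p, hi=2.0, iters=60):
    """Solve phi*(gamma_p) = p/100 by bisection.

    For p > 50, phi* lies between 1/2 and phi_hat, so gamma_hat(p) is a valid
    lower bracket; hi is assumed to satisfy phi*(hi) > p/100.
    """
    if p <= 50:
        raise ValueError("this bracketing assumes p > 50")
    lo, target = gamma_hat(p), p / 100.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi_star(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def area_between(a, b, steps=200):
    """Trapezoidal estimate of the area between phi* and phi_hat on [a, b]."""
    h = (b - a) / steps
    gaps = [abs(phi_star(a + k * h) - phi_hat(a + k * h)) for k in range(steps + 1)]
    return h * (sum(gaps) - 0.5 * (gaps[0] + gaps[-1]))


g90, g99 = gamma_star(90), gamma_star(99)
print("next doses between", round(g90, 3), "and", round(g99, 3))
print("(i)   area on [0, 1]:", area_between(0.0, 1.0))
print("(ii)  area on [g90, g99]:", area_between(g90, g99))
print("(iii) shifts:", g90 - gamma_hat(90), g99 - gamma_hat(99))
```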


3.  CHOOSING  THE  NEXT  r  DESIGN  POINTS 

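As well as basing the inference for design upon the one-step-ahead
(myopic) predictive probabilities, it is in principle possible to consider the
$r$-step-ahead probabilities

$$\phi^*(y_{n+1}, \ldots, y_{n+r} \mid y_1, \ldots, y_n; x_1, \ldots, x_{n+r}) = E^{(n)} \prod_{i=n+1}^{n+r} \frac{e^{\beta_0 y_i + \beta_1 x_i y_i}}{1 + e^{\beta_0 + \beta_1 x_i}}, \qquad (3.1)$$

where the expectation operator $E^{(n)}$ relates to the posterior distribution of
$\beta_0$ and $\beta_1$, given $y_1, \ldots, y_n$ and $x_1, \ldots, x_n$. A possible approximation
to $\phi^*$, based upon standard results for approximating the expectation of a
nonlinear function in terms of expectations and covariances, is

$$\phi^* \approx \prod_{i=n+1}^{n+r} a_i \left\{1 + \sum_{n+1 \le i < j \le n+r} \frac{(2y_i - 1)(2y_j - 1)\, w(x_i, x_j)}{a_i a_j}\right\}, \qquad (3.2)$$

where

$$a_i = \{\phi^*(x_i)\}^{y_i}\{1 - \phi^*(x_i)\}^{1 - y_i} \qquad (3.3)$$

(each factor in (3.1) is linear in $\theta_i = e^{\beta_0 + \beta_1 x_i}/(1 + e^{\beta_0 + \beta_1 x_i})$, with slope $2y_i - 1$,
so only the covariances between distinct factors enter at second order), and
the predictive covariance $w(x_i, x_j)$ of $\theta_i$ and $\theta_j$ may be computed, using
two-dimensional numerical integrations, from

$$w(x_i, x_j) = E^{(n)}\left[\left\{\frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} - \phi^*(x_i)\right\}\left\{\frac{e^{\beta_0 + \beta_1 x_j}}{1 + e^{\beta_0 + \beta_1 x_j}} - \phi^*(x_j)\right\}\right]. \qquad (3.4)$$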


The expression in (3.1) is, after time $n$, a function of the $r$ zero-one
random variables $y_{n+1}, \ldots, y_{n+r}$ and the future design points
$x_{n+1}, \ldots, x_{n+r}$. Therefore, for $r$ larger than 2 or 3 it will be a bit too
unwieldy to yield easy interpretations. However, a great deal of the
information contained in this joint distribution is summarized by the
predictive means $\phi^*(x_{n+1}), \ldots, \phi^*(x_{n+r})$, together with the predictive
covariances in (3.4). When $i = j$ the latter include the predictive
variances $v(x_{n+1}), \ldots, v(x_{n+r})$, obtained from (2.5).

It seems that not too much interpretable information would be wasted by
just referring to the plots of the $\phi^*$ and $\sqrt{v}$ curves, already calculated
for the one-step-ahead design problem. Unless the user is confident in a
special decision theoretic criterion which he wishes to combine with (3.1), we
recommend that he does just this, but combined with the more detailed
information contained in the posterior distributions of the effective doses,
discussed in the next section. For example, if he is interested in obtaining
good estimates of the effective doses for the LD50, LD90 and LD99 points,
then he could carry out further experiments for dose levels at the medians of
those posterior distributions which seem insufficiently informative.
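The quantities (3.1) to (3.4) can be checked numerically. The sketch below replaces the two-dimensional numerical integrations with plain Monte Carlo draws from the bivariate normal approximation of Section 2 (a swapped-in technique), with hypothetical $\hat{\boldsymbol\beta}$ and $C_n$. For $r = 2$ the second-order approximation coincides with the Monte Carlo value of (3.1) computed from the same draws, because each factor in (3.1) is linear in $\theta_i$; for $r \ge 3$ it ignores higher-order terms.

```python
import itertools
import math
import random

# Hypothetical posterior summaries for illustration (not values from the report).
B_HAT = (-6.0, 22.6)
COV = ((2.5, -9.0), (-9.0, 34.0))


def expit(a):
    """Logistic function e^a / (1 + e^a), evaluated without overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def posterior_draws(n_draws, seed=1):
    """Draws of (beta0, beta1) from N(B_HAT, COV) via a Cholesky factor."""
    rng = random.Random(seed)
    l11 = math.sqrt(COV[0][0])
    l21 = COV[1][0] / l11
    l22 = math.sqrt(COV[1][1] - l21 ** 2)
    draws = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        draws.append((B_HAT[0] + l11 * z1, B_HAT[1] + l21 * z1 + l22 * z2))
    return draws


def r_step_table(doses, draws):
    """For every outcome vector: (Monte Carlo value of (3.1), approximation (3.2))."""
    thetas = [[expit(b0 + b1 * x) for x in doses] for b0, b1 in draws]
    n_draws, r = len(thetas), len(doses)
    mean = [sum(t[i] for t in thetas) / n_draws for i in range(r)]      # phi*(x_i)
    # (3.4); the diagonal entries are the predictive variances of (2.5)
    w = [[sum((t[i] - mean[i]) * (t[j] - mean[j]) for t in thetas) / n_draws
          for j in range(r)] for i in range(r)]
    table = {}
    for ys in itertools.product((0, 1), repeat=r):
        mc = sum(math.prod(t[i] if ys[i] else 1.0 - t[i] for i in range(r))
                 for t in thetas) / n_draws                             # (3.1)
        a = [mean[i] if ys[i] else 1.0 - mean[i] for i in range(r)]     # (3.3)
        cross = sum((2 * ys[i] - 1) * (2 * ys[j] - 1) * w[i][j] / (a[i] * a[j])
                    for i in range(r) for j in range(i + 1, r))
        table[ys] = (mc, math.prod(a) * (1.0 + cross))                  # (3.2)
    return table


draws = posterior_draws(5000)
for ys, (mc, approx) in r_step_table([0.25, 0.35, 0.50], draws).items():
    print(ys, round(mc, 4), round(approx, 4))
```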

4.  THE  POSTERIOR  DISTRIBUTION  OF  THE  EFFECTIVE  DOSE 

Under the normal approximation to the posterior distribution of
$\beta_0$ and $\beta_1$, as described in Section 2, it is possible to use a standard
Jacobian/integration method to obtain the posterior density, after $n$
observations, of the effective dose $\gamma_p$ corresponding to the LDp point,
for any $p$ lying strictly between 0 and 100. This is evident since, with
$\lambda = \log p - \log(100 - p)$, $\gamma_p$ is expressible, in terms of $\beta_0$ and $\beta_1$, as

$$\gamma_p = (\lambda - \beta_0)/\beta_1. \qquad (4.1)$$




The consequent approximate posterior density of $\gamma_p$ is a complicated
function of cumulative normal distribution functions. However, another, more
interesting, approach is available by reference to the marginal likelihood
procedures of Sprott and Kalbfleisch (1969).

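In terms of the exact likelihood

$$\ell(\boldsymbol\beta \mid \mathbf{y}, \mathbf{x}) = \exp\Big\{\beta_0 \sum_{i=1}^{n} y_i + \beta_1 \sum_{i=1}^{n} x_i y_i\Big\} \Big/ \prod_{i=1}^{n} \big(1 + e^{\beta_0 + \beta_1 x_i}\big) \qquad (4.2)$$

of $\beta_0$ and $\beta_1$, the marginal likelihood of $\gamma_p$ is defined to be

$$\ell(\gamma_p \mid \mathbf{y}, \mathbf{x}) = \sup_{\boldsymbol\beta \in S_p} \ell(\boldsymbol\beta \mid \mathbf{y}, \mathbf{x}), \qquad (4.3)$$

where

$$S_p = \{(\beta_0, \beta_1) : \gamma_p = (\lambda - \beta_0)/\beta_1\}. \qquad (4.4)$$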

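The expression in (4.3) may be calculated, for each fixed $\gamma_p$, by
replacing $\beta_1$ in

$$\ell(\gamma_p, \beta_1 \mid \mathbf{y}, \mathbf{x}) = \exp\Big\{\lambda \sum_{i=1}^{n} y_i + \beta_1 \sum_{i=1}^{n} (x_i - \gamma_p) y_i\Big\} \Big/ \prod_{i=1}^{n} \big[1 + \exp\{\lambda + \beta_1 (x_i - \gamma_p)\}\big] \qquad (4.5)$$

by the solution for $\beta_1$ to the conditional maximum likelihood equation

$$\sum_{i=1}^{n} (x_i - \gamma_p) \frac{e^{\lambda + \beta_1 (x_i - \gamma_p)}}{1 + e^{\lambda + \beta_1 (x_i - \gamma_p)}} = \sum_{i=1}^{n} (x_i - \gamma_p) y_i, \qquad (4.6)$$

which may be solved iteratively, using Newton-Raphson. Note that the solution
for $\beta_1$ is a function of $\gamma_p$. Therefore substitution for $\beta_1$ in (4.5) will
affect this expression as a function of $\gamma_p$. This means that the marginal
likelihood will be a non-obvious function with a wider spread than that obtained by
replacing $\beta_1$ by its unconditional maximum likelihood estimate $\hat\beta_1$.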




Leonard (1982a) describes a general result, under wide regularity
conditions, which shows that the marginal posterior density of a parameter,
under an uninformative prior, is well approximated by its marginal likelihood
when the latter is normalized to integrate to unity. The approximation is
indeed almost exact in a variety of special cases. Therefore the expression
in (4.3) may be viewed as approximately proportional to the marginal posterior
density of $\gamma_p$. This is useful, since $\gamma_p$ may now be estimated, say, by its
posterior median, as an alternative to the maximum likelihood estimate, and
approximate posterior probability statements may be made about $\gamma_p$.

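Leonard gives a refinement to this approximation, which may be justified
by approximating the marginalization of the joint posterior density of $\gamma_p$
and $\beta_1$, using the first term in an Edgeworth expansion. Hence we propose
approximating the marginal posterior density of $\gamma_p$ by

$$\pi(\gamma_p \mid \mathbf{y}, \mathbf{x}) \propto \left[\sum_{i=1}^{n} \frac{(x_i - \gamma_p)^2\, e^{\lambda + \tilde\beta_1 (x_i - \gamma_p)}}{\{1 + e^{\lambda + \tilde\beta_1 (x_i - \gamma_p)}\}^2}\right]^{-1/2} \ell(\gamma_p \mid \mathbf{y}, \mathbf{x}), \qquad (4.7)$$

where $\ell(\gamma_p \mid \mathbf{y}, \mathbf{x})$ is the marginal likelihood and $\tilde\beta_1$ is the solution for $\beta_1$
to (4.6). The adjustment term in (4.7) is based upon the second derivative of
the log of the expression in (4.5) with respect to $\beta_1$.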
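A sketch of the marginal likelihood (4.3), computed by solving (4.6) with Newton-Raphson at each $\gamma_p$ on a grid, together with the refinement (4.7). It uses the grouped data of Table 1 as example input (grouped sums equal the per-rat sums in (4.5) and (4.6)). Normalizing over [0.05, 0.5] is an assumption of this sketch, made because the marginal likelihood need not decay to zero in the tails; the function names are hypothetical.

```python
import math

# Table 1 (Bliss, 1952): (dose level, number of rats, number of fertile rats).
BLISS = [(0.15, 5, 0), (0.20, 10, 2), (0.25, 10, 4),
         (0.30, 10, 8), (0.40, 11, 11), (0.60, 11, 11)]


def expit(a):
    """Logistic function e^a / (1 + e^a), evaluated without overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def log1pexp(a):
    """log(1 + e^a), evaluated without overflow."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def conditional_fit(gamma, lam, data, tol=1e-12):
    """Solve (4.6) for beta1 at fixed gamma_p by Newton-Raphson.

    Returns (beta1_tilde, log of (4.5) at beta1_tilde, the curvature sum in (4.7)).
    """
    def loglik(b1):       # log of (4.5), i.e. beta0 = lam - b1 * gamma
        return sum(r * (lam + b1 * (x - gamma)) - n * log1pexp(lam + b1 * (x - gamma))
                   for x, n, r in data)

    def curvature(b1):    # minus the second derivative of loglik in beta1
        total = 0.0
        for x, n, _ in data:
            p = expit(lam + b1 * (x - gamma))
            total += n * (x - gamma) ** 2 * p * (1.0 - p)
        return total

    b1 = 0.0
    for _ in range(200):
        s = sum((x - gamma) * (r - n * expit(lam + b1 * (x - gamma))) for x, n, r in data)
        d, step, base = s / curvature(b1), 1.0, loglik(b1)
        while loglik(b1 + step * d) < base - 1e-12 and step > 1e-10:
            step /= 2.0
        b1 += step * d
        if abs(step * d) < tol:
            break
    return b1, loglik(b1), curvature(b1)


def marginal_density(p, data, lo=0.05, hi=0.5, m=450, refine=False):
    """Marginal likelihood (4.3), or (4.7) if refine=True, normalized on [lo, hi]."""
    lam = math.log(p) - math.log(100.0 - p)
    grid = [lo + k * (hi - lo) / m for k in range(m + 1)]
    logs = []
    for g in grid:
        _, ll, curv = conditional_fit(g, lam, data)
        logs.append(ll - 0.5 * math.log(curv) if refine else ll)
    top = max(logs)
    dens = [math.exp(v - top) for v in logs]
    h = (hi - lo) / m
    area = h * (sum(dens) - 0.5 * (dens[0] + dens[-1]))   # trapezoidal rule
    return grid, [d / area for d in dens]


def grid_median(grid, dens):
    """Posterior median from cumulative trapezoidal areas, linearly interpolated."""
    cum = 0.0
    for k in range(1, len(grid)):
        piece = 0.5 * (dens[k - 1] + dens[k]) * (grid[k] - grid[k - 1])
        if cum + piece >= 0.5:
            return grid[k - 1] + (0.5 - cum) / piece * (grid[k] - grid[k - 1])
        cum += piece
    return grid[-1]


grid, dens = marginal_density(50, BLISS)
print("posterior median of gamma_50:", grid_median(grid, dens))
```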

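Whilst the density in (4.7) is easy enough to compute, a more explicit
approximation is available; this will be a bit less accurate numerically.
Consider the multivariate normal approximation

$$\ell^*(\boldsymbol\beta \mid \mathbf{y}, \mathbf{x}) = \ell(\hat{\boldsymbol\beta} \mid \mathbf{y}, \mathbf{x}) \exp\{-\tfrac{1}{2} (\boldsymbol\beta - \hat{\boldsymbol\beta})^T I_n (\boldsymbol\beta - \hat{\boldsymbol\beta})\}, \qquad (4.8)$$

where $\hat{\boldsymbol\beta} = (\hat\beta_0, \hat\beta_1)^T$ is the standard maximum likelihood estimator of $\boldsymbol\beta$, and
$I_n$ is the information matrix in (2.2). Under this approximation the
marginal likelihood in (4.3) and the density in (4.7) may be respectively
replaced by

$$\ell^*(\gamma_p \mid \mathbf{y}, \mathbf{x}) \propto \exp\left\{-\frac{1}{2} \frac{(\hat\beta_0 + \hat\beta_1 \gamma_p - \lambda)^2}{c_{22}\gamma_p^2 + 2c_{12}\gamma_p + c_{11}}\right\} \qquad (4.9)$$

and

$$\pi^*(\gamma_p \mid \mathbf{y}, \mathbf{x}) \propto (c_{22}\gamma_p^2 + 2c_{12}\gamma_p + c_{11})^{-1/2}\, \ell^*(\gamma_p \mid \mathbf{y}, \mathbf{x}), \qquad (4.10)$$

where $c_{jk}$ is the $(j,k)$th element of the inverse of $I_n$. These results
illustrate specific functional forms in terms of $\gamma_p$.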

Note, however, the interesting point that, whilst the tails of the
approximating function in (4.10) will quickly become negligibly close to zero
as $|\gamma_p|$ becomes large, the functions will not theoretically possess finite
integrals over $-\infty < \gamma_p < \infty$, since they will ultimately behave like a
constant times $|\gamma_p|^{-1}$ as $|\gamma_p| \to \infty$ (the exponent in (4.9) tends to the
constant $-\hat\beta_1^2/(2c_{22})$, while the factor $(c_{22}\gamma_p^2 + 2c_{12}\gamma_p + c_{11})^{-1/2}$ decays
like $|\gamma_p|^{-1}$). This phenomenon is illustrated by the thick tails in Figure 3.
If the Jacobian/integration method indicated in the first paragraph of this
section had been performed, then the tails would instead be Cauchy-like, owing
to the normal random variable in the denominator of (4.1). If the exact
(logistic) likelihood of $\beta_0$ and $\beta_1$ were employed, together with
non-informative uniform priors for $\beta_0$ and $\beta_1$, then the tails of the exact
marginal posterior density of $\gamma_p$ would be slightly thinner than Cauchy and,
like the Cauchy, would possess a finite integral over $-\infty < \gamma_p < \infty$. For the
approximation in (4.10) it would therefore seem reasonable to down-weight the
tails after a certain point. In most numerical examples the situation will be
so clear cut that it will be adequate to do this graphically. For example, for
the posterior density of the median effective dose in Figure 3, it would seem
reasonable to down-weight the tails outside the interval (0.15, 0.4) in such a
fashion that they become virtually equal to zero at $\gamma_p = 0.05$ and $\gamma_p = 0.5$.
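The explicit forms (4.9) and (4.10) need only $\hat{\boldsymbol\beta}$ and $C_n = I_n^{-1}$; the sketch below uses hypothetical values of both. It also checks the $|\gamma_p|^{-1}$ tail behaviour numerically and includes one possible down-weighting of the tails. The linear taper, equal to 1 on (0.15, 0.4) and reaching zero at 0.05 and 0.5, is a hypothetical choice; the report only suggests doing this graphically.

```python
import math

# Hypothetical posterior summaries for illustration (not values from the report).
B_HAT = (-6.0, 22.6)
COV = ((2.5, -9.0), (-9.0, 34.0))   # c_jk = elements of the inverse information


def quad_form(g):
    """c22 g^2 + 2 c12 g + c11, the variance of beta0 + beta1 * g."""
    return COV[1][1] * g * g + 2.0 * COV[0][1] * g + COV[0][0]


def lik_star(g, p):
    """Approximate marginal likelihood (4.9), scaled to equal 1 at its maximum."""
    lam = math.log(p) - math.log(100.0 - p)
    return math.exp(-0.5 * (B_HAT[0] + B_HAT[1] * g - lam) ** 2 / quad_form(g))


def pi_star(g, p):
    """Unnormalized approximate posterior density (4.10) of gamma_p."""
    return lik_star(g, p) / math.sqrt(quad_form(g))


def taper(g, inner=(0.15, 0.40), outer=(0.05, 0.50)):
    """Hypothetical linear down-weighting: 1 inside `inner`, 0 outside `outer`."""
    if g <= outer[0] or g >= outer[1]:
        return 0.0
    if g < inner[0]:
        return (g - outer[0]) / (inner[0] - outer[0])
    if g > inner[1]:
        return (outer[1] - g) / (outer[1] - inner[1])
    return 1.0


# |gamma| * pi* settles to a constant, so the tails decay only like 1/|gamma|.
for g in (1e2, 1e3, 1e4):
    print(g, g * pi_star(g, 50))
```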




5.  RELATED  WORK 


Ramsey (1972) and Disch (1981) describe a non-parametric Bayesian
approach where the prior distribution of the derivatives of the whole response
curve is assumed to follow a Dirichlet process. This yields a posterior
estimate for the response curve which is a weighted average of a step function
and a prior estimate. It would be possible to calculate the predictive
probability using their prior assumptions. Alternatively, Leonard (1978,
1982b) describes a Gaussian prior distribution across function space for a
logistic transform; his non-parametric smoothing procedure could also be
adapted to this situation.

For design purposes, it seems easier to employ the simple parametric
assumption in (2.1). This yields an analysis free from the choice of prior
parameters, and gives a continuous estimate of the response curve. For
example, Disch attempts to obtain the posterior distribution of the effective
dose and runs into substantial technical problems, whereas the approximate
densities in (4.7) and (4.10) are much easier to calculate.

In practical terms, a substantial amount of zero-one data would be needed
to dispute the parametric form in (2.1). Moreover, it would, in such
circumstances, be possible to use a slightly more general parametric form, e.g.
permitting skewness or thicker tails, before proceeding to a full-dress
non-parametric Bayesian approach. I view non-parametric Bayes as useful for
investigating a hypothesized model when a moderate amount of data has been
collected; for design problems and less complete amounts of data, a previously
established parametric form will probably work better.


6.  NUMERICAL  EXAMPLE 


The data in Table 1, introduced by Bliss (1952, p. 540), indicate the
fertility or non-fertility of 57 rats, each of which has been subjected to one
of six dose levels of a drug. The dose levels are described as fractions of
25 milligrams.


Table 1: Fertility Experiment for 57 Rats

Dose level (fraction of 25 mg)    0.15   0.20   0.25   0.30   0.40   0.60
No. of rats                          5     10     10     10     11     11
No. of fertile rats                  0      2      4      8     11     11

The  curves  in  Figure  1  relate  to  the  first  three  dose  levels  only,  whilst 
Figures  2  and  3  are  based  upon  the  whole  data  set. 

The $\phi^*$ curve in Figure 1 differs substantially from the $\hat\phi$ curve,
suggesting that more data need to be collected. When choosing the next
design point, suppose, for example, that the main interest lies in LD points
above the LD90 point. Then the $\phi^*$ curve suggests that the design point
should be chosen above 0.47, a substantially different recommendation from the
value of 0.44 suggested by the cruder $\hat\phi$ curve. If interest lay in the LD99
point, then the low trajectory of the $\phi^*$ curve suggests that substantially
more data need to be collected; it seems reasonable to start off with
several more design points between about 0.6 and 0.8. The $\phi^*$ and $\hat\phi$
curves are closest together for design points between 0.20 and 0.27, and the
standard deviation of $\phi^*$ dips at this point. This suggests that the
previously chosen design points 0.15, 0.20 and 0.25 are best for
investigating LD points just above the LD20 point, and that future design
points will need to be more widely dispersed if a good estimate of the whole
curve is required. If, for example, interest lies in the whole curve
between the LD10 point and the LD90 point, then future design points should be
scattered between 0.11 and 0.47. Therefore the $\phi^*$ curve effectively defines
the scale on which the design points should be chosen.

Moving on to Figure 2, we see that, with the increased information, the
$\phi^*$ curve is closer to the $\hat\phi$ curve. Given experience with the method, the
remaining differences would help us to judge how much more data is needed
before we would be able to stop the experiment on the grounds that $\phi^*$ and
$\hat\phi$ are close enough for our purposes. At the moment the curves seem just
about close enough to stop if we are only interested in good estimates between
the LD10 point and the LD90 point. For the $\phi^*$ and $\hat\phi$ curves these LD
points correspond to design points within the respective ranges (0.16, 0.37)
and (0.17, 0.36), which are fairly close. However, if, say, interest lay in
the LD99 point, then it seems better to collect more data; the $\phi^*$ and $\hat\phi$
curves now yield effective doses of 0.54 and 0.47, which are more discrepant.

Once the quantity of data collected gives us reasonable accuracy, it
is useful to look more closely at the posterior distributions of various
effective doses, in order to substantiate any inferences drawn from the
predictive curve. The densities in Figure 3 correspond to the LD50, LD90 and
LD99 points. As we move from left to right we see that the curves become
progressively flatter and progressively more skewed. For a given required
degree of accuracy, they tell us which effective doses can be accurately
estimated from the present data. The spread of the densities gives us some
idea of what proportions of future design points should be placed at different
LD points of interest.

In summary, we feel that the curves in Figures 1, 2 and 3 provide the user
with most of the information necessary to make suitable inferences about
future design points. A formal analysis based upon special assumptions would
conceal much of this information and, whilst giving conditional optimality,
would restrict the user from making a practically reasonable choice of design
point.

7.  A  SENSIBLE  CHOICE  OF  DESIGN  MEASURE 

Bioassay design problems are, of course, very different from design
problems for the linear statistical model, where the optimal design points
tend to stick to the boundary of the design space. An obvious design measure
in the present nonlinear situation could be based upon a prespecified weight
function $w(p)$ indicating the relative importance of LD points $1, 2, \ldots, 99$,
where $\sum_p w(p) = 1$. The derivative of our recommended design measure is then
the mixture

$$\pi(\gamma \mid \mathbf{y}, \mathbf{x}) = \sum_{p=1}^{99} \pi_p(\gamma \mid \mathbf{y}, \mathbf{x})\, w(p), \qquad (7.1)$$

where $\pi_p$ is the posterior density of the effective dose $\gamma_p$. Areas under
the curve in (7.1) should be taken to indicate the proportions of future
design points which should be placed in the corresponding dose level
regions. The user may find (7.1) to be a useful, though slightly more formal,
diagnostic aid. For example, if he views the importance of the LD50, LD90 and
LD99 points as roughly equal, and no other LD point as being of interest, then he
should average the curves in Figure 3 and place his design points
accordingly. As the average curve is a density, there will be no problems with
any boundary of the design space. The user should stop sampling when the
curve in (7.1) is sufficiently spiked, at those $p$ for which $w(p) > 0$, to
guarantee accurate enough estimation of the effective doses.
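A sketch of the mixture (7.1) and of the proportions of future design points it implies for a set of dose intervals. Each $\pi_p$ is taken here to be the approximation (4.10) with hypothetical $\hat{\boldsymbol\beta}$ and $C_n$, normalized over an assumed dose range [0, 1]; the interval cut points are illustrative only.

```python
import math

# Hypothetical posterior summaries for illustration (not values from the report).
B_HAT = (-6.0, 22.6)
COV = ((2.5, -9.0), (-9.0, 34.0))


def pi_star(g, p):
    """Unnormalized approximate posterior density (4.10) of gamma_p."""
    lam = math.log(p) - math.log(100.0 - p)
    q = COV[1][1] * g * g + 2.0 * COV[0][1] * g + COV[0][0]
    return math.exp(-0.5 * (B_HAT[0] + B_HAT[1] * g - lam) ** 2 / q) / math.sqrt(q)


def design_measure(weights, lo=0.0, hi=1.0, m=1000):
    """Mixture (7.1) on a grid; weights maps p to w(p).

    Each pi_p is normalized on [lo, hi] (trapezoidal rule) before mixing.
    """
    h = (hi - lo) / m
    grid = [lo + k * h for k in range(m + 1)]
    mix = [0.0] * (m + 1)
    for p, wp in weights.items():
        dens = [pi_star(g, p) for g in grid]
        area = h * (sum(dens) - 0.5 * (dens[0] + dens[-1]))
        for k in range(m + 1):
            mix[k] += wp * dens[k] / area
    return grid, mix


def proportions(grid, mix, cuts):
    """Share of future design points in each interval [cuts[j], cuts[j+1])."""
    shares = [0.0] * (len(cuts) - 1)
    for k in range(len(grid) - 1):
        mid = 0.5 * (grid[k] + grid[k + 1])   # assign each grid cell by its midpoint
        for j in range(len(cuts) - 1):
            if cuts[j] <= mid < cuts[j + 1]:
                shares[j] += 0.5 * (mix[k] + mix[k + 1]) * (grid[k + 1] - grid[k])
                break
    return shares


# Equal interest in LD50, LD90 and LD99, as in the example above:
grid, mix = design_measure({50: 1 / 3, 90: 1 / 3, 99: 1 / 3})
print(proportions(grid, mix, [0.0, 0.2, 0.3, 0.4, 0.5, 1.0]))
```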


Consider now the situation where there is no special preference for
particular design points and the objective of the experiment is simply to
obtain a reasonable estimate of the whole response curve. In this case the
choice $w(p) = 1/99$ for all $p$ seems appropriate. The design curve in
Figure 4 makes this assumption for the data in Table 1. Note from Figure 3
that the median of the curve in Figure 4 is close to the posterior median of
the median effective dose. The design curve is, of course, more spread out than
the posterior density of the median effective dose.

We propose this choice of the weights $w(p)$ as leading to a reasonably
objective design measure which should be appropriate in a wide range of
situations. Rather than being developed from an optimality criterion, it can
be justified purely by thinking in terms of probability, i.e. (i) for a
particular LDp point, $\pi_p$ is the obvious design measure, since dose levels
chosen according to this measure will have the best chance of covering the LDp
part of the curve; (ii) if we want there to be an equal chance of the selected
dose levels covering LD points $1, \ldots, 99$, then the uniform mixture of the
$\pi_p$'s is clearly appropriate; if we do not want this, then the mixing
probabilities should be adjusted accordingly. In our opinion, criteria like
D-optimality can only serve to confuse the issue.
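To turn the uniform mixture into actual dose levels, one can draw $p$ uniformly from $1, \ldots, 99$ and then draw $\gamma_p$. The sketch below does the second step by simulating $(\beta_0, \beta_1)$ from the normal approximation of Section 2 and applying (4.1), so it samples the Jacobian/integration version of the $\gamma_p$ distribution (with Cauchy-like tails) rather than (4.10). The feasible dose range [0, 1], the seed and the posterior values are hypothetical, and draws outside the range are simply rejected.

```python
import math
import random

# Hypothetical posterior summaries for illustration (not values from the report).
B_HAT = (-6.0, 22.6)
COV = ((2.5, -9.0), (-9.0, 34.0))


def sample_design_points(k, lo=0.0, hi=1.0, seed=2, max_tries=100000):
    """Draw k dose levels from the uniform mixture of the gamma_p distributions."""
    rng = random.Random(seed)
    l11 = math.sqrt(COV[0][0])                 # Cholesky factor of COV
    l21 = COV[1][0] / l11
    l22 = math.sqrt(COV[1][1] - l21 ** 2)
    doses = []
    for _ in range(max_tries):
        if len(doses) == k:
            break
        p = rng.randint(1, 99)                 # w(p) = 1/99
        lam = math.log(p) - math.log(100.0 - p)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0 = B_HAT[0] + l11 * z1
        b1 = B_HAT[1] + l21 * z1 + l22 * z2
        if b1 != 0.0 and lo <= (lam - b0) / b1 <= hi:   # gamma_p from (4.1)
            doses.append((lam - b0) / b1)
    return sorted(doses)


print(sample_design_points(8))
```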


ACKNOWLEDGEMENTS 

The  author  wishes  to  thank  Chien  F.  Wu,  N.  Singpurwalla,  and  George  E.  P. 
Box  for  helpful  comments  and  suggestions.  Sponsored  by  the  United  States  Army 
under  Contract  No.  DAAG29-80-C-0041 . 




REFERENCES 


Bliss, C. I. (1952). The Statistics of Bioassay. Academic Press.

Disch, D. (1981). Bayesian non-parametric inferences for effective doses in a
quantal experiment. Biometrics 37, 713-722.

Freeman, P. R. (1970). Optimal sequential estimation of the median effective
dose. Biometrika 57, 79-81.

Geisser, S. (1971). The inferential use of predictive distributions. In
Foundations of Statistical Inference, ed. Godambe and Sprott.
Toronto: Holt, Rinehart and Winston, pp. 456-466.

Leonard, T. (1978). Density estimation, stochastic processes, and prior
information (with Discussion). J. Roy. Statist. Soc. B 40, 113-146.

Leonard, T. (1982a). Comment on the paper by Lejeune and
Faulkenberry. J. Amer. Statist. Assoc. (June issue, to appear).

Leonard, T. (1982b). An empirical Bayesian approach to the smooth
estimation of unknown functions. University of Wisconsin, Mathematics
Research Center, Technical Report No. 2339.

Ramsey, F. L. (1972). A Bayesian approach to bioassay. Biometrics 28,
841-848.

Sprott, D. A. and Kalbfleisch, J. D. (1969). Examples of likelihoods and
comparison with point estimates and large sample approximations. J.
Amer. Statist. Assoc. 64, 465-484.

Wetherill, G. B. (1966). Sequential Methods in Statistics. Methuen.




FIGURE 1. PREDICTIVE CURVES FOR DATA AT FIRST 3 DOSE LEVELS
(curves: $\phi^*$, $\hat\phi$, st. dev. of $\phi^*$)

FIGURE 3


REPORT DOCUMENTATION PAGE

Title: AN INFERENTIAL APPROACH TO THE BIOASSAY DESIGN PROBLEM
Type of report and period covered: Summary Report - no specific reporting period
Author: Tom Leonard
Contract or grant number: DAAG29-80-C-0041
Performing organization name and address: Mathematics Research Center, University of
Wisconsin, 610 Walnut Street, Madison, Wisconsin 53706
Program element, project, task, area and work unit numbers: Work Unit Number 4 -
Statistics and Probability
Controlling office name and address: U. S. Army Research Office, P.O. Box 12211,
Research Triangle Park, North Carolina 27709
Report date: September 1982
Number of pages: 16
Security class (of this report): UNCLASSIFIED
Distribution statement (of this report): Approved for public release; distribution unlimited.
Key words: Bioassay, Design, Predictive distribution, Response curve, Posterior
distribution of effective dose, Design measure, D-optimality

DD FORM 1473 / UNCLASSIFIED




